Fix issues with chat template application on responses requests by michalkulakowski · Pull Request #4173 · openvinotoolkit/model_server

michalkulakowski · 2026-04-30T12:48:32Z

🛠 Summary

JIRA/Issue if applicable.
Describe the changes.

🧪 Checklist

Unit tests added.
The documentation updated.
Change follows security best practices.

Copilot

Pull request overview

This PR improves handling of OpenAI Responses API requests in OVMS LLM serving so that the existing Python/Jinja chat-template path can reliably consume Responses-format inputs (including tool-calling and reasoning-related items).

Changes:

Add debug logging before applying the Python/Jinja chat template.
Extend Responses input parsing to accept additional item/content shapes (reasoning summaries, tool-call items, missing/empty content, output_text).
Build a processedJson payload in chat/completions-style (messages + converted tools) for the Python/Jinja template path.
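
The third change, building a chat/completions-style payload from Responses input, can be pictured with a small sketch. It is not the PR's implementation: the `ResponsesItem`, `ToolCall`, and `ChatMessage` structs and the `toChatMessages` function are hypothetical stand-ins that use plain structs instead of JSON, and the rule of merging consecutive `function_call` items into one assistant message is an assumption about what tool-call merging means here.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one Responses API input item.
struct ResponsesItem {
    std::string type;       // "message", "function_call", or "function_call_output"
    std::string role;       // used when type == "message"
    std::string text;       // message text, or tool output for function_call_output
    std::string callId;     // used by function_call and function_call_output
    std::string name;       // function name for function_call
    std::string arguments;  // JSON-encoded arguments for function_call
};

// Hypothetical stand-in for one entry of a chat/completions "tool_calls" array.
struct ToolCall {
    std::string id;
    std::string name;
    std::string arguments;
};

// Hypothetical stand-in for one chat/completions message.
struct ChatMessage {
    std::string role;
    std::string content;
    std::string toolCallId;           // set only for role == "tool"
    std::vector<ToolCall> toolCalls;  // set only for assistant tool-call turns
};

// Convert Responses-style items into chat/completions-style messages.
// Assumption: consecutive function_call items belong to one assistant turn.
std::vector<ChatMessage> toChatMessages(const std::vector<ResponsesItem>& items) {
    std::vector<ChatMessage> messages;
    bool lastWasFunctionCall = false;
    for (const auto& item : items) {
        if (item.type == "function_call") {
            // Open a new assistant message only at the start of a run of calls.
            if (!lastWasFunctionCall) {
                messages.push_back(ChatMessage{"assistant", "", "", {}});
            }
            messages.back().toolCalls.push_back(ToolCall{item.callId, item.name, item.arguments});
            lastWasFunctionCall = true;
            continue;
        }
        lastWasFunctionCall = false;
        if (item.type == "function_call_output") {
            messages.push_back(ChatMessage{"tool", item.text, item.callId, {}});
        } else if (item.type == "message") {
            messages.push_back(ChatMessage{item.role, item.text, "", {}});
        }
        // Any other item type is ignored in this sketch.
    }
    return messages;
}
```

Given a user message followed by two `function_call` items and one `function_call_output`, this sketch yields three chat messages: the user turn, one assistant turn carrying both tool calls, and one tool turn.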

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/llm/py_jinja_template_processor.cpp | Adds a debug log of the incoming request body before applying the chat template. |
| src/llm/apis/openai_responses.cpp | Enhances Responses API parsing and constructs chat/completions-compatible `processedJson` including tool conversion and tool-call merging. |

dkalinowski · 2026-05-12T13:06:40Z

        return false;
    }
-
+    SPDLOG_DEBUG("Before chat template: \n {}", requestBody);


Do we want to keep it?

@michalkulakowski

dkalinowski · 2026-05-12T13:07:37Z

            } catch (const std::exception& e) {
                SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Failed to apply chat template: {}", e.what());
-                return absl::Status(absl::StatusCode::kInvalidArgument, "Failed to apply chat template. The model either does not have chat template or has an invalid one.");
+                return absl::Status(absl::StatusCode::kInvalidArgument, absl::StrCat("Failed to apply chat template: ", e.what()));


Do we want to expose the call stack to the user?

@michalkulakowski
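
One possible middle ground for the question above, offered only as a sketch: keep the full `e.what()` in the debug log and return a shortened version to the client. The `summarizeError` helper and its `maxLen` parameter are hypothetical and do not appear in the PR. It assumes that the first line of the exception text is the useful part and that any trace follows on later lines.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: keep only the first line of an exception message
// and cap its length, so multi-line traces are not returned to the client.
std::string summarizeError(const std::string& what, std::size_t maxLen = 200) {
    // find() returns npos when there is no newline; substr then keeps everything.
    std::string firstLine = what.substr(0, what.find('\n'));
    if (firstLine.size() > maxLen) {
        firstLine.resize(maxLen);
        firstLine += "...";
    }
    return firstLine;
}
```

The returned status message could then be built from `summarizeError(e.what())` while the debug log still records the full text.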

przepeck · 2026-05-14T08:46:47Z

 mkdir -p ${HOME}/models
 docker run -d --user $(id -u):$(id -g) --rm -p 8000:8000 -v ${HOME}/models:/models --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) openvino/model_server:weekly \
--rest_port 8000 --model_repository_path /models --source_model Junrui2021/Qwen3-VL-8B-Instruct-int4 --tool_parser hermes3 --target_device GPU --task text_generation --pipeline_type VLM_CB --allowed_media_domains raw.githubusercontent.com
+--rest_port 8122 --model_repository_path /models --source_model Junrui2021/Qwen3-VL-8B-Instruct-int4 --model_name ovms-model --tool_parser hermes3 --target_device GPU --task text_generation --pipeline_type VLM_CB --allowed_media_domains raw.githubusercontent.com


I think this change is unintended

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/test/http_openai_handler_test.cpp:3902

Same issue as parseResponses(): EXPECT_FALSE on parse errors is non-fatal, so the helper can proceed to construct/parse a handler against an invalid document and return a status that may not reflect the original failure clearly. Consider using ASSERT_FALSE for JSON parsing failures (or returning an explicit InvalidArgument status when doc.HasParseError() is true).

    doc.Parse(json.c_str());
    EXPECT_FALSE(doc.HasParseError()) << json;
    std::optional<uint32_t> maxTokensLimit;
    uint32_t bestOfLimit = 0;
    std::optional<uint32_t> maxModelLength;
    auto apiHandler = std::make_shared<ovms::OpenAIResponsesHandler>(
        doc, ovms::Endpoint::RESPONSES, std::chrono::system_clock::now(), tokenizer);
    return apiHandler->parseRequest(maxTokensLimit, bestOfLimit, maxModelLength);

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (2)

src/test/http_openai_handler_test.cpp:3915

parseResponses() returns nullptr after ADD_FAILURE() when parsing fails; downstream helpers/tests (e.g., expectResponsesEquivalentToChatCompletions) dereference the returned pointer without asserting non-null, which can crash the test and obscure the real failure. Add an ASSERT_NE(apiHandler, nullptr) at the start of consumers or refactor the helper to fail fatally.

    doc.Parse(json.c_str());
    if (doc.HasParseError()) {
        ADD_FAILURE() << "Failed to parse JSON: " << json;
        return nullptr;
    }

src/llm/apis/openai_responses.cpp:550

ProcessedJsonSink::emitStandaloneReasoning() omits the "content" field entirely. Templates that access message.content unconditionally will fail on such messages. Emit "content": "" (empty string) for standalone reasoning turns to keep processedJson compatible with a wider set of templates.

    void emitStandaloneReasoning(const std::string& reasoning) {
        rapidjson::Value msgObj(rapidjson::kObjectType);
        msgObj.AddMember("role", rapidjson::Value("assistant", alloc), alloc);
        msgObj.AddMember("reasoning_content", rapidjson::Value(reasoning.c_str(), alloc), alloc);
        messagesArray.PushBack(msgObj, alloc);
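
The suggestion above amounts to always emitting a `content` key, even when it is empty. A minimal sketch of that shape follows. It uses a `std::map` as a hypothetical stand-in for the rapidjson object, and the `Message` alias and `makeStandaloneReasoning` function are illustrative names that are not from the PR.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a JSON message object with string values only.
using Message = std::map<std::string, std::string>;

// Build a standalone reasoning turn that always carries a "content" key.
Message makeStandaloneReasoning(const std::string& reasoning) {
    Message msg;
    msg["role"] = "assistant";
    msg["content"] = "";  // present but empty, so message.content lookups succeed
    msg["reasoning_content"] = reasoning;
    return msg;
}
```

A template that reads `message.content` unconditionally then sees an empty string rather than a missing field.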

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

src/llm/apis/openai_responses.cpp:660

parseInput() logs “Parsed responses input … without mutating request JSON”, but parseResponsesPart() now mutates doc before calling parseInput (tools normalization and potentially adding chat_template_kwargs). This debug message is misleading; consider rewording it or moving the tools normalization to after parseInput so the statement remains true.

    } else {
        return absl::InvalidArgumentError("input is not a string or array");
    }

    SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Parsed responses input directly to chat history without mutating request JSON");
    return absl::OkStatus();

src/test/http_openai_handler_test.cpp:708

This comment says “For Responses, processedJson is always built from chatHistory”, but the updated Responses implementation now builds processedJson directly from the original input (including reasoning/function_call handling) rather than re-serializing chatHistory. Please update the comment to reflect the current contract to avoid confusion when maintaining these tests.

    ASSERT_NE(apiHandler, nullptr);

    // For Responses, processedJson is always built from chatHistory.
    // For chat/completions with simple text, processedJson is empty (original body is used instead).
    // In both cases, the chatHistory should be equivalent.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings April 30, 2026 12:48

Copilot started reviewing on behalf of michalkulakowski April 30, 2026 12:49

Copilot AI reviewed Apr 30, 2026

michalkulakowski force-pushed the mkulakow/responses_fix_2 branch from 035101b to 3ac1cf1 May 6, 2026 13:22

dtrawins added this to the 2026.2_rc milestone May 8, 2026

michalkulakowski force-pushed the mkulakow/responses_fix_2 branch 3 times, most recently from 773cb1a to 0aa9a48 May 12, 2026 11:53

dkalinowski reviewed May 12, 2026

michalkulakowski force-pushed the mkulakow/responses_fix_2 branch from 0aa9a48 to 2153f52 May 12, 2026 13:12

michalkulakowski added the Code Freeze label May 12, 2026

dkalinowski approved these changes May 13, 2026

michalkulakowski force-pushed the mkulakow/responses_fix_2 branch from 2153f52 to 99b08c3 May 13, 2026 11:23

przepeck reviewed May 14, 2026

michalkulakowski force-pushed the mkulakow/responses_fix_2 branch from ed876b5 to 6a0baef May 14, 2026 09:44

michalkulakowski requested a review from przepeck May 14, 2026 11:48

przepeck approved these changes May 15, 2026

michalkulakowski requested a review from Copilot May 15, 2026 12:26

Copilot started reviewing on behalf of michalkulakowski May 15, 2026 12:27

Copilot AI reviewed May 15, 2026

Comment thread src/llm/visual_language_model/continuous_batching/servable.cpp Outdated

Comment thread src/llm/apis/openai_responses.cpp

Comment thread src/llm/apis/openai_responses.cpp Outdated

Comment thread src/test/http_openai_handler_test.cpp

michalkulakowski requested a review from Copilot May 15, 2026 14:26

Copilot started reviewing on behalf of michalkulakowski May 15, 2026 14:27

Copilot AI reviewed May 15, 2026

Comment thread src/test/http_openai_handler_test.cpp

Comment thread src/test/http_openai_handler_test.cpp

Comment thread src/llm/apis/openai_responses.cpp Outdated

Comment thread src/llm/apis/openai_responses.cpp Outdated

Comment thread src/llm/apis/openai_responses.cpp

michalkulakowski requested a review from Copilot May 18, 2026 07:47

Copilot started reviewing on behalf of michalkulakowski May 18, 2026 07:47

Copilot AI reviewed May 18, 2026

Comment thread src/llm/apis/openai_responses.cpp Outdated

Comment thread src/llm/apis/openai_responses.cpp Outdated

Comment thread src/llm/apis/openai_responses.cpp Outdated

Comment thread src/llm/servable.cpp Outdated

mkulakow added 3 commits May 18, 2026 10:24

Support functions in responses api

7993047

uts

3f0a858

Update tests

041265c

fix

221e0ce

michalkulakowski requested a review from Copilot May 18, 2026 08:43

Copilot started reviewing on behalf of michalkulakowski May 18, 2026 08:45

Copilot AI reviewed May 18, 2026

Comment thread src/llm/servable.cpp Outdated

fix

250e3ca

michalkulakowski requested a review from Copilot May 18, 2026 09:20

Copilot AI reviewed May 18, 2026

fix

5e25882

michalkulakowski requested a review from Copilot May 18, 2026 09:39

Copilot AI reviewed May 18, 2026

Comment thread src/llm/visual_language_model/continuous_batching/servable.cpp

Comment thread src/llm/apis/openai_responses.cpp

Comment thread src/llm/apis/openai_responses.cpp

Copilot started reviewing on behalf of michalkulakowski May 18, 2026 10:01

Copilot started reviewing on behalf of michalkulakowski May 18, 2026 10:09

fix

7f3328a

michalkulakowski requested a review from Copilot May 18, 2026 11:13

Copilot started reviewing on behalf of michalkulakowski May 18, 2026 11:14

Copilot AI reviewed May 18, 2026

Comment thread src/llm/apis/openai_responses.cpp Outdated

update comment

55f7076

michalkulakowski requested a review from Copilot May 18, 2026 11:21

Copilot started reviewing on behalf of michalkulakowski May 18, 2026 11:22

Copilot AI reviewed May 18, 2026

fix demo

ac7f667

michalkulakowski requested a review from Copilot May 18, 2026 11:55

Copilot started reviewing on behalf of michalkulakowski May 18, 2026 11:56

Copilot AI reviewed May 18, 2026

Comment thread src/llm/apis/openai_responses.cpp Outdated

fix

5978a63

michalkulakowski requested a review from Copilot May 18, 2026 12:33

Copilot started reviewing on behalf of michalkulakowski May 18, 2026 12:34

Copilot AI reviewed May 18, 2026

style

e836d74

michalkulakowski merged commit f386355 into main May 19, 2026
1 check passed
